Accumulation of driver and passenger mutations during tumor progression
Major efforts to sequence cancer genomes are now occurring throughout the world [1, 2]. Though
the emerging data from these studies are illuminating, their reconciliation with epidemiologic and
clinical observations poses a major challenge [3]. In the current study, we provide a novel
mathematical model that begins to address this challenge. We model tumors as a discrete time branching
process [4] that starts with a single driver mutation and proceeds as each new driver mutation leads
to a slightly increased rate of clonal expansion. Using the model, we observe tremendous variation 
in the rate of tumor development - providing an understanding of the heterogeneity in tumor sizes 
and development times that have been observed by epidemiologists and clinicians. Furthermore, 
the model provides a simple formula for the number of driver mutations as a function of the total 
number of mutations in the tumor. Finally, when applied to recent experimental data, the model 
allows us to calculate, for the first time, the actual selective advantage provided by typical somatic 
mutations in human tumors in situ. This selective advantage is surprisingly small, 0.005 ± 0.0005, 
and has major implications for experimental cancer research. 



It is now well-accepted that virtually all cancers re- 
sult from the accumulated mutations in genes that in- 
crease the fitness of a tumor cell over that of the cells 
that surround it [5, 6]. As a result of advances in technology
and bioinformatics, it has recently become possible
to determine the entire compendium of mutant genes in 
a tumor [7, 8^ ^ IQi |34l [S^ . Studies to date have revealed 
a complex genome, with ~40 to 80 amino-acid changing
mutations present in a typical solid tumor [HI [TOJ IMl I36j . 
For low frequency mutations, it is difficult to distinguish
"driver mutations", defined as those that confer a selective
growth advantage to the cell, from "passenger
mutations" [51 [Ol [Tl] . Passenger mutations are defined 
as those which do not alter fitness but occurred in a cell 
that coincidentally or subsequently acquired a driver mu- 
tation, and are therefore found in every cell with that 
driver mutation. It is believed that only a small fraction 
of the total mutations in a tumor are driver mutations, 
but new, quantitative models are clearly needed to help 
interpret the significance of the mutational data and to 
put them into the perspective of other lines of cancer 
research investigation. 

In most previous models of tumor evolution, mutations 
accumulate in cell populations of constant size |15l[TFlll7j 
or of variable size, but the models take into account only 
one or two mutations [IHl [HI UHl IHl In our new 

model, we assume that each new driver mutation leads 
to a slightly faster tumor growth rate. This model is as 
simple as possible as the analytical results depend on only 
three parameters: the average driver mutation rate, the 



average selective advantage associated with driver muta- 
tions, and the average cell division time. 

Tumors are initiated by the first genetic alteration that 
provides a relative fitness advantage. In the case of typi- 
cal leukemias, this would represent the first alteration of 
an oncogene, such as a translocation between BCR and 
ABL. In the case of solid tumors, the mutation that ini- 
tiated the process might actually be the second "hit in 
a tumor suppressor gene - the first hit affects one allele, 
without causing a growth change, while the second hit, in 
the opposite allele, leaves the cell without any functional 
suppressor, in accord with the two-hit hypothesis [23 . It 
is important to point out that we are modeling tumor 
progression, not initiation jlSI [TB] , because progression is 
rate-limiting for cancer mortality - it generally requires 
three or more decades for metastatic cancers to develop 
from initiated cells in humans. 

Our first goal is to characterize the times at which successive driver mutations arise
in a tumor of growing size. We have employed a discrete time branching process in
this model because it makes the numerical simulations feasible. In a discrete time
process, all cell divisions are synchronized. We present analytic formulas for this
discrete time branching process and, when possible, analogous formulas for the
continuous time case (in Appendix G). At each time step, a cell can either divide or
differentiate, senesce, or die. In the context of tumor expansion, there is no difference
between differentiation, death, and senescence, as none of these processes will result
in a greater number of tumor cells than present prior to that time step. We assume that
driver mutations reduce the probability that the cell will take this second course, i.e.,
that it will differentiate, die, or senesce, henceforth grouped as "stagnate". A cell with
$k$ driver mutations therefore has a stagnation probability $d_k = \frac{1}{2}(1-s)^k$.
The division probability is $b_k = 1 - d_k$. The parameter $s$ characterizes the (average)
selective advantage of each driver mutation that occurs following the initiating mutation.

FIG. 1: Variability in tumor progression. Number of cells with a given number of
driver mutations versus the age of the tumor. Six different realizations of the same
stochastic process with the same parameter values are shown, corresponding to tumor
growth in six 'patients'. The process is initiated with a single surviving founder cell
with one driver mutation. The times at which subsequent driver mutations arose varied
widely among 'patients'. After initial stochastic fluctuations, each new mutant lineage
grew exponentially. The overall dynamics of tumor growth are greatly affected by the
random time of the appearance of new mutants with surviving lineages. Parameter values:
mutation rate $u = 10^{-5}$, selective advantage $s = 0.5\%$ and generation time
$T = 3$ days.

When a cell divides, one of the daughter cells can receive an additional driver
mutation with probability $u$. The point mutation rate per base per cell division is
$\sim 5 \times 10^{-10}$. There is a finite number of tumor suppressor genes and
oncogenes that can be mutated at each time point and lead to driver mutations. We
conservatively estimate that there are ~100 tumor suppressor genes and ~100 oncogenes
in a human cell, and that on average each tumor suppressor gene can be inactivated
by mutation at ~200 positions and each oncogene can be activated in ~10 positions.
There are thus a total of ~21,000 positions in the genome that could become
driver mutations. As the rate of chromosome loss in tumors is much higher than the rate
of point mutation [15], a single point mutation is rate limiting for inactivation
of tumor suppressor genes. The driver mutation rate is therefore $\sim 10^{-5}$ per
cell division ($\approx 21{,}000 \times 5 \times 10^{-10}$).
Our theory can accommodate any realistic mutation rate 
and the major numerical results are only weakly affected 
by varying the mutation rate within a reasonable range. 
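
The arithmetic behind this estimate can be written out explicitly. The short sketch below only restates the numbers quoted in the paragraph above; the constant names are ours and are not part of the model.

```python
# Back-of-the-envelope estimate of the driver mutation rate per cell division,
# using the approximate counts quoted in the text.
N_SUPPRESSORS = 100          # ~100 tumor suppressor genes
N_ONCOGENES = 100            # ~100 oncogenes
SITES_PER_SUPPRESSOR = 200   # inactivating positions per suppressor gene
SITES_PER_ONCOGENE = 10      # activating positions per oncogene
POINT_MUTATION_RATE = 5e-10  # per base per cell division

driver_sites = (N_SUPPRESSORS * SITES_PER_SUPPRESSOR
                + N_ONCOGENES * SITES_PER_ONCOGENE)
driver_rate = driver_sites * POINT_MUTATION_RATE

print(driver_sites)           # 21000
print(f"{driver_rate:.2e}")   # 1.05e-05, i.e. roughly 10^-5
```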

Experimental evidence suggests that tumor cells divide
about once every three days in glioblastoma
multiforme [24] and once every four days in colorectal
cancers [25]. Incorporating these division times into the
simulations provided by our model leads to the dramatic
results presented in Fig. 1. Though the same parameter
values, $u = 10^{-5}$ and $s = 0.5\%$, were used for
each simulation, there was an enormous variation in the 
times required for disease progression. For example, in 
'Patient 1', the second driver mutation had not occurred 
within 20 years following tumor initiation and the size 
of the tumor remained small (micrograms, representing 
< lO'^ cells). In contrast, in 'Patient 6', the second driver 
mutation occurred after only two years and by 20 years 
the tumor would weigh kilograms (10^^ cells), with the 
most common cell types in the tumor having three or 
four driver mutations. 'Patients 2 to 5' had progression 
rates between these two extreme cases. 
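
To make the process concrete, here is a minimal cell-by-cell sketch of one synchronized generation of the branching process described above. It is an illustration under stated assumptions, not the simulation code used for Fig. 1: the function names, the dictionary representation and the `max_cells` cap are ours, and looping over individual cells is only practical for small populations (Appendix A describes the multinomial sampling used for efficiency).

```python
import random

def stagnation_prob(k, s):
    """d_k = (1/2) * (1 - s)**k for a cell with k driver mutations."""
    return 0.5 * (1.0 - s) ** k

def step(counts, u, s, rng):
    """Advance one synchronized generation.

    counts maps k (number of drivers) -> number of cells with k drivers.
    """
    new_counts = {}
    for k, n_cells in counts.items():
        d_k = stagnation_prob(k, s)
        for _ in range(n_cells):
            if rng.random() < d_k:
                continue  # the cell stagnates (differentiates, dies or senesces)
            # The cell divides; one daughter gains a driver with probability u.
            if rng.random() < u:
                new_counts[k] = new_counts.get(k, 0) + 1
                new_counts[k + 1] = new_counts.get(k + 1, 0) + 1
            else:
                new_counts[k] = new_counts.get(k, 0) + 2
    return new_counts

def simulate(u, s, generations, seed=0, start=None, max_cells=100_000):
    """Run the process, stopping early on extinction or above max_cells."""
    rng = random.Random(seed)
    counts = dict(start) if start else {1: 1}  # default: one founder, one driver
    for _ in range(generations):
        if not counts or sum(counts.values()) > max_cells:
            break
        counts = step(counts, u, s, rng)
    return counts

# Example call; most single-founder runs go extinct early, as expected
# for a survival probability of roughly 2s.
print(simulate(u=1e-5, s=0.005, generations=200, seed=1))
```

Repeating the call with different seeds gives a feel for the run-to-run variability that Fig. 1 illustrates on a much longer time scale.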

The results in Fig. 1 provide insights into the nature
of tumor development in patients with familial adenomatous
polyposis (FAP) [26]. If untreated (by colectomy),
these patients develop adenomas while teenagers, but do 
not develop cancers until their fourth or fifth decades of 
life, by which time there are thousands of tumors per 
patient. Each of these thousands of tumors represents 
a process that is roughly similar to one of the processes 
in Fig. 1. Hence it is likely that at least one of them 
will progress relatively fast, as in 'Patient 6' in Fig. 1. 
A patient without FAP who has initiated one polyp has
only a ~0.001 probability of having a cell in that polyp
that has accumulated ten driver mutations, even after 20 
years. On the other hand, a patient that has initiated 
1000 polyps has a 0.63 probability of having a cell in 
at least one polyp with ten driver mutations during the 
same time period. This quantitatively explains (i) why 
the polyps in an individual patient with FAP are so het- 
erogeneous in size and (ii) why cancers are so much more 
common in FAP patients than in the general population, 
despite the fact that the inherited mutation affects only 
tumor initiation and does not directly affect tumor pro- 
gression. 
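
The step from 0.001 for one polyp to 0.63 for 1000 polyps follows if the polyps are treated as independent, which is the assumption made in this small sketch (the function name is ours).

```python
def prob_at_least_one(p_single, n_polyps):
    """Probability that at least one of n independent polyps contains a cell
    with ten driver mutations, given the single-polyp probability."""
    return 1.0 - (1.0 - p_single) ** n_polyps

print(round(prob_at_least_one(0.001, 1), 3))     # 0.001
print(round(prob_at_least_one(0.001, 1000), 2))  # 0.63
```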

We can calculate the average time between the appearance of successful cell lineages
(Fig. 2). Not all new mutants are successful, because stochastic fluctuations
can lead to the extinction of a lineage. The lineage of a cell with $k$ driver mutations
survives only with a probability approximately $1 - d_k/b_k \approx 2sk$. Assuming that
$u \ll ks \ll 1$, the average time between the first successful cell with $k$ and the
first successful cell with $k + 1$ driver mutations is given by

$$\tau_k = \frac{T}{ks}\log\frac{2ks}{u}. \tag{1}$$

FIG. 2: Schematic representation of waves of clonal expansions. An illustration of a
sequence of clonal expansions of cells with $k = 1$, 2, 3 or 4 driver mutations is shown.
Here $\tau_1$ is the average time it takes the lineage of the founder cell to produce a
successful cell with two driver mutations. Similarly, $\tau_k$ is the average time between
the appearance of cells with $k$ and $k + 1$ mutations. Equation (1) gives a simple
formula for these waiting times, which shows that subsequent driver mutations appear
faster and faster. The cumulative time to have $k$ driver mutations grows with the
logarithm of $k$.

The introduction of subsequent driver mutations becomes faster and faster (see
Appendices C and D). For example, for $u = 10^{-5}$, $s = 10^{-2}$ and $T = 4$ days it
takes on average 8.3 years until the 2nd driver mutation emerges, but only 4.5 more
years until the 3rd driver mutation emerges. The cumulative time to have $k$ mutations
grows logarithmically with $k$. See Table 1 for additional numerical examples.

In contrast to driver mutations, passenger mutations 
do not confer a fitness advantage, and they do not modify 
tumor growth rates. We find that the average number of 
passenger mutations, n(t), present in a tumor cell after 
t days is proportional to t, that is n(t) = vt/T, where v 
is the rate of acquisition of neutral mutations. In fact, v 
is the product of the point mutation rate per base pair 
and the number of base pairs analyzed. This simple re- 
lation has been used to analyze experimental results by 
providing estimates for relevant time scales [25].

Combining our results for driver and passenger mutations, we can derive a formula for
the number of passengers that are expected in a tumor that has accumulated
$k$ driver mutations (see Appendix F):

$$n = \frac{v}{2s}\log\frac{4ks^2}{u^2}\,\log k. \tag{2}$$

Here n is the number of passengers that were present in 
the last cell that clonally expanded. Equation (2) can be 
most easily applied to tumors in tissues in which there 
was not much cell division prior to the tumor initiation, 
as otherwise a difficult estimate of the expected number 
of passengers that accumulated in a precursor cell prior 
to tumor initiation would be required. 
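
As an illustration, the driver-passenger relation $n = \frac{v}{2s}\log\frac{4ks^2}{u^2}\log k$ can be evaluated as follows. This is a sketch only: the function name is ours, and the value of $v$ used in the example is a hypothetical placeholder, since $v$ depends on the number of base pairs analyzed.

```python
import math

def expected_passengers(k, s, u, v):
    """Expected number of passengers in a tumor with k drivers:
    n = v / (2 s) * log(4 k s^2 / u^2) * log(k)."""
    return v / (2 * s) * math.log(4 * k * s**2 / u**2) * math.log(k)

V_HYPOTHETICAL = 0.01  # placeholder passenger mutation rate, not from the text
for k in (2, 5, 10):
    print(k, expected_passengers(k, s=0.005, u=1e-5, v=V_HYPOTHETICAL))
```

Note that the formula gives $n = 0$ for $k = 1$, consistent with counting only passengers accumulated after tumor initiation.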

To test the validity of this model, we applied it to two tumor types that have been
extensively analyzed. Neither the astrocytic precursor cells that give rise to
glioblastoma multiforme (GBM) [27] nor the pancreatic duct epithelial cells that give
rise to pancreatic adenocarcinomas [28] divide much prior to tumor initiation [25, 30].
Therefore the data on both tumors should be suitable for our analysis. Parsons et al. [31]
sequenced 20,661 protein coding genes in a series of GBM tumors and found a total of 713
somatic mutations in the 14 samples that are depicted in Fig. 3. Similarly, Jones et al.
sequenced the same genes in a series of pancreatic adenocarcinomas, finding a total of
562 somatic mutations in the 9 primary tumors graphed in Fig. 3. In both cases, we
classified missense mutations as drivers if they were classified as such by the CHASM
algorithm at false discovery rate (FDR) 0.2 [37] (see Appendices H and I).
We also considered all nonsense mutations, out-of-frame insertions or deletions (INDELs)
and splice-site changes as drivers because these generally lead to inactivation of
the protein products [36]. All other somatic mutations were considered to be passengers.

    s       u             tau_1   tau_2   tau_3   tau_4
    0.1%    10^-5          58.0    32.8    23.4    18.3
    0.5%    10^-5          15.1     8.3     5.8     4.5
    1%      10^-5           8.3     4.5     3.2     2.5
    2%      10^-5           4.5     2.5     1.7     1.3
    10%     10^-5           1.1     0.6     0.4     0.3

    1%      10^-6          10.8     5.8     4.0     3.1
    1%      5 x 10^-6       9.1     4.9     3.4     2.7
    1%      10^-5           8.3     4.5     3.2     2.5
    1%      2 x 10^-5       7.6     4.2     2.9     2.3
    1%      10^-4           5.8     3.3     2.3     1.8

TABLE I: Times between clonal waves. Numerical values for the average time (in years)
between the first successful cell with $k$ and $k + 1$ driver mutations, for different
values of the selective advantage $s$ and the mutation rate $u$. Cells divide every
$T = 4$ days. The table shows that changing the selective advantage of drivers has a
large effect on the waiting times, while changing the driver mutation rate has a
relatively small effect.

FIG. 3: Comparison of clinical mutation data and theory. Our theory provides an estimate
for how the number of passenger mutations found in a tumor is related to the number
of driver mutations. Here we show a comparison of equation (2) (green line) with genomic
data from glioblastoma multiforme (GBM) and pancreatic cancer. We also compare our
analytic result with computer simulations. Parameter values used in equation (2) and
computer simulations were $s = 0.5\%$ and $u = 10^{-5}$. a, Equation (2) (green line)
fitted to GBM data from 14 patients. b, Equation (2) (green line) fitted to pancreatic
cancer data from 9 patients. c, Comparison of computer simulations and equation (2). For
each $k$ between 2 and 10, the number of passengers that were brought along with the last
driver in 10 tumors with $k$ drivers is plotted. d, Comparison between computer
simulations and equation (2) for selective advantage of the $k$-th driver, $s_k$, taken
from a Gaussian distribution with mean $s$ and standard deviation $\sigma = s/2$. For
each $k$ between 2 and 10, the number of passengers that were brought along with the last
driver in 10 tumors with $k$ drivers is plotted.

From Fig. 3a and b, it is clear that the experimental results on both GBM and pancreatic
cancers were in good accord with the predictions of equation (2). A critical test of the
model can be performed by comparison of the best-fit parameters governing each tumor type.
It is expected that the average selective advantage of a driver mutation should be
similar across all tumor types given that the pathways through which these mutations
act overlap to a considerable degree. Setting the driver mutation rate to be
$u = 10^{-5}$ and fitting equation (2) to the GBM data using least squares analysis, we
found that the optimum fit was given by $s = 0.0048 \pm 0.0004$. Remarkably, using the
same mutation rate in pancreatic cancers, we find that the best fit is given by a nearly
identical $s = 0.0050 \pm 0.0005$. This consistency not only provides support for the
model but also provides evidence that the average selective advantage of a driver is
$s \approx 0.5\%$. For $u = 10^{-6}$ and $u = 10^{-4}$, we get $s \approx 0.65\%$
and $s \approx 0.32\%$, respectively. The fact that these estimates are not strongly
dependent on the mutation rate supports the robustness of the model.
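
A quick numerical way to see this weak dependence is to evaluate the driver-passenger relation $n = \frac{v}{2s}\log\frac{4ks^2}{u^2}\log k$ for the three reported $(u, s)$ pairs. The sketch below divides out $v$, which is the same in all three cases; the function and variable names are ours, and this is a consistency check rather than the least squares fit itself.

```python
import math

def passengers_per_v(k, s, u):
    """Predicted passengers divided by v: log(4 k s^2 / u^2) * log(k) / (2 s)."""
    return math.log(4 * k * s**2 / u**2) * math.log(k) / (2 * s)

# (u, s) pairs quoted in the text
reported = [(1e-6, 0.0065), (1e-5, 0.005), (1e-4, 0.0032)]
for k in (2, 5, 10):
    print(k, [round(passengers_per_v(k, s, u)) for u, s in reported])
```

For $k$ between 2 and 10 the three parameter pairs give predicted passenger counts that agree to within a few percent, so changing $u$ by a factor of ten in either direction is compensated by a modest change in $s$.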

In Beerenwinkel et al. [32], we previously modeled tumors which are slowly expanding
due to some constraint, using a Wright-Fisher process. The new results
are considerably different. For example, Beerenwinkel et al. found that the waiting
time required to accumulate $k$ driver mutations was proportional to $k$, while in
the new model the waiting time depends on the logarithm of $k$. Our formula for the
waiting time provides a much better explanation for the long initial stages in the
adenoma-carcinoma sequence [25].

Like all models, ours incorporates limiting assumptions. However, many of these
assumptions can be loosened without changing the key conclusions. For example, we
assumed that the selective advantage of every driver was the same. We have tested
whether our formulas still hold in a setting wherein the selective advantage of the
$k$-th driver is $s_k$, and the $s_k$ are drawn from a Gaussian distribution with mean
$s$ and standard deviation $\sigma = s/2$. The simulations were still in excellent
agreement with equation (2) (Fig. 3d). Similarly, we assumed that the time between cell
divisions (generation time $T$) was constant. Nevertheless, equation (2), which
gives the relationship between drivers and passengers, is derived without any
specification of time between cell divisions. Consequently, this formula is not affected
by a possible change in $T$. Finally, there could be a finite carrying capacity for each
mutant lineage. In other words, cells with one driver mutation may only grow up to a
certain size, and the tumor may only grow further if it accumulates an extra mutation,
allowing cells with two mutations to grow until they reach their carrying capacity and
so on. It is reasonable to assume that the carrying capacities of each class would be
much larger than $1/u$, which is approximately the number of cells with $k$ mutations
needed to produce a cell with $k + 1$ mutations. Thus, the times at which new mutations
arise would not be much affected by this potential confounding factor.

Given the true complexity of cancer, our model is de- 
liberately oversimplified. Despite its simplicity, however, 
it captures some essential characteristics of the genetic 
complexity underlying tumor growth. Simple models 
have already been very successful in providing important 
insights into cancer. Notable examples include Knudson's
two-hit model [23] and the Armitage-Doll multi-hit
hypothesis [33]. The model described here represents the
first attempt to provide analytical insights into the re- 
lationship between drivers and passengers in tumor pro- 
gression and will hopefully be similarly stimulating. One 
of the major conclusions, i.e., that the selective growth 
advantage afforded by the mutations that drive tumor 
progression is very small (~ 0.5%), has major implica- 
tions for understanding tumor evolution. For example, 
it shows how difficult it will be to create valid in vitro 
models to test such mutations on tumor growth; selec- 
tive growth advantages of 0.5% are nearly impossible to 
discern in cell culture over short time periods. And it 
explains why so many driver mutations are needed to 
form an advanced malignancy within the lifetime of an 
individual.

APPENDIX A: SIMULATIONS

We model tumor progression with a discrete time Galton-Watson branching process. In our
model, at each time step a cell with $j$ mutations (or $j$-cell) either divides into two
cells, which occurs with probability $b_j$, or dies with probability $d_j$, where
$b_j + d_j = 1$. In addition, at every division, one of the daughter cells can acquire
an additional mutation with probability $u$. The process is initiated by a single cell
with one mutation. We set $d_j = \frac{1}{2}(1-s)^j$, so that additional mutations reduce
the probability of cell death.

In simulations, we track the numbers of cells with $j$ mutations, $N_j$, for
$j = 1, 2, \ldots$, rather than the fate of each individual cell. We increase the
efficiency of the computation by sampling from the multinomial distribution at each time
step. Let $N_j(t)$ be the number of cells with $j$ mutations at time $t$. Then the number
of $j$-cells that will give birth to an identical daughter cell, $B_j$, the number of
$j$-cells that will die, $D_j$, and the number of $j$-cells that will give birth to a cell
with an extra mutation, $M_j$, are sampled from the multinomial distribution with

$$\mathrm{Prob}[(B_j, D_j, M_j) = (n_1, n_2, n_3)] = \frac{N_j(t)!}{n_1!\,n_2!\,n_3!}\,[b_j(1-u)]^{n_1}\,d_j^{\,n_2}\,(b_j u)^{n_3}, \tag{A1}$$

for $n_1 + n_2 + n_3 = N_j(t)$. Then,

$$N_j(t+1) = N_j(t) + B_j - D_j + M_{j-1}. \tag{A2}$$

Note that in this model, all cell divisions and cell deaths occur simultaneously at each
time step. One could define an analogous continuous time model, with a very similar
behavior. Simulations of the continuous time model, however, are much less efficient,
since the updates occur at smaller and smaller time steps as the population size grows.

APPENDIX B: AVERAGE NUMBER OF j-CELLS

In this section, we derive an analytical expression for the average abundance of cells
with $j$ mutations after $t$ time steps in a Galton-Watson process. Since the population
is initiated by a single cell, it might go extinct due to random events. If the
population survives, it exhibits waves of clonal expansions.

Let $x_j(n)$ denote the average number of $j$-cells after $n$ time steps. The averages
$x_j$ evolve according to the following system of difference equations:

$$x_1(n+1) = b_1(2-u)\,x_1(n), \qquad x_j(n+1) = b_j(2-u)\,x_j(n) + b_{j-1}\,u\,x_{j-1}(n), \quad j > 1, \tag{B1}$$

with initial conditions $x_1(0) = 1$ and $x_j(0) = 0$ for $j > 1$. This infinite system of
linear difference equations can be solved exactly. For a single initial 1-cell, we obtain

$$x_1(n) = [b_1(2-u)]^n. \tag{B2}$$

From this, we can find the solution for 2-cells and so on, arriving at the formula for
the expected number of $j$-cells

$$x_j(t) = u^{j-1}\left(\prod_{l=1}^{j-1} b_l\right)\sum_{k=1}^{j}\frac{[b_k(2-u)]^t}{\prod_{q=1,\,q\neq k}^{j}(2-u)(b_k - b_q)}. \tag{B3}$$

The expression (B3) is the exact formula for the average number of cells with $j$
mutations in our process, which holds for any choice of selective advantage $s$ and
mutation rate $u$ between 0 and 1. This can be easily proven by substituting the
solution (B3) into the equations (B1).

This formula can be approximated by a simpler analytical expression in the small $s$ and
$u \ll s$ limit, which are reasonable assumptions for tumor progression.

Since the $b_k$ are strictly increasing with $k$, $x_j(t)$ can be well approximated by
its leading term behavior,

$$x_j(t) \approx a_j\,[b_j(2-u)]^t, \tag{B4}$$

with

$$a_j = \left(\frac{u}{2-u}\right)^{j-1}\prod_{l=1}^{j-1}\frac{b_l}{b_j - b_l},$$

for all times $t$ for which $e^{st} \gg 1$. In the weak selection limit, $s \ll 1/j$, and
in the low mutation limit, $u \ll 1$, we can approximate $(1-s)^j \approx 1 - js$ and
$(2-u) \approx 2$, to get

$$a_j \approx \frac{1}{(j-1)!}\left(\frac{u}{2s}\right)^{j-1}. \tag{B5}$$

In addition, if $u \ll s$ we have $[b_j(2-u)]^t \approx [2 - (1-s)^j]^t$, leading to the
expression for the average abundance of $j$-cells after $t$ time steps

$$x_j(t) \approx \frac{1}{(j-1)!}\left(\frac{u}{2s}\right)^{j-1}[2 - (1-s)^j]^t. \tag{B6}$$

Note that in the last equation we do not approximate $(1-s)^j$ with $1 - js$, since for
large values of $t$ this approximation would lead to significant errors.



Formulas (B3) and (B6) are averages calculated considering both trajectories of
extinction and non-extinction. The average abundance of cells with $j$ mutations after
$t$ time steps, conditioned on non-extinction, is given by

$$\tilde{x}_j(t) = \frac{x_j(t)}{1 - q(t)}, \tag{B7}$$

where $q(t)$ is the probability that the population goes extinct by $t$. We note that
extinction is most likely an early event in the process and that it occurs when
the population consists almost entirely of 1-cells. Thus we can approximate $q(t)$ by
the eventual extinction probability of a population initiated by a single cell with one
mutation, in a process with no mutation,

$$q(t) \approx q_1. \tag{B8}$$

FIG. 4: Average number of cells with $j$ mutations, as predicted by formula (B6).
Parameter values are $s = 0.01$, $u = 10^{-5}$ and $T = 4$ days.

FIG. 5: Average number of cells with $j$ mutations, comparison of formula (B6) and
simulations. Parameter values are $s = 0.01$, $u = 10^{-5}$, $T = 4$ days and $t = 9$
years.

In a process with no mutation, the probability that a population initiated by a single
cell with one mutation goes extinct is given by

$$q_1 = \frac{d_1}{b_1} = \frac{1-s}{1+s} \approx 1 - 2s, \tag{B9}$$

where the last approximation is valid in the small $s$ limit.

Finally we have the expression for the average number of cells with $j$ mutations,
conditioned on non-extinction,

$$\tilde{x}_j(t) \approx \frac{1}{2s}\,\frac{1}{(j-1)!}\left(\frac{u}{2s}\right)^{j-1}[2 - (1-s)^j]^t. \tag{B10}$$

The average numbers of cells with $j$ mutations for $j = 1, \ldots, 20$, as predicted by
formula (B10), are shown in Fig. S1. In Fig. S2, we plot the excellent agreement
between these predicted averages and simulations. However, we note that due to large
fluctuations in the numbers of cells with $j$ mutations at time $t$ (see Fig. 1 from
the main text), the averages (B10) are not that useful for describing the dynamics of
the process, as they are biased towards realizations with large numbers of cells.
For example, in Fig. S1 we see that the average number of cells with 2 mutations is 1
after about 2 years. On the other hand, simulations suggest that the average time to
the first successful cell with 2 mutations is approximately
8 years for the parameter values from Fig. S1.
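
The quality of the leading-term approximation can be checked numerically. The sketch below iterates the difference equations $x_1(n+1) = b_1(2-u)x_1(n)$, $x_j(n+1) = b_j(2-u)x_j(n) + b_{j-1}u\,x_{j-1}(n)$ and compares the result with $\frac{1}{(j-1)!}\left(\frac{u}{2s}\right)^{j-1}[2-(1-s)^j]^t$. Function names and the chosen test values are ours; this is an illustration, not the code behind Figs. S1 and S2.

```python
import math

def division_prob(j, s):
    """b_j = 1 - d_j with d_j = (1/2) * (1 - s)**j."""
    return 1.0 - 0.5 * (1.0 - s) ** j

def exact_means(t, s, u, j_max):
    """Iterate the difference equations for t steps.

    Returns [x_1(t), ..., x_jmax(t)], starting from a single 1-cell.
    """
    x = [0.0] * (j_max + 1)  # index 0 is unused
    x[1] = 1.0
    for _ in range(t):
        new = [0.0] * (j_max + 1)
        for j in range(1, j_max + 1):
            new[j] = division_prob(j, s) * (2 - u) * x[j]
            if j > 1:
                new[j] += division_prob(j - 1, s) * u * x[j - 1]
        x = new
    return x[1:]

def approx_mean(j, t, s, u):
    """Leading-term approximation for small s and u << s."""
    return (u / (2 * s)) ** (j - 1) / math.factorial(j - 1) * (2 - (1 - s) ** j) ** t

s, u, t = 0.01, 1e-5, 500  # e^{st} is about 150, so the leading term dominates
exact = exact_means(t, s, u, j_max=3)
for j in (1, 2, 3):
    print(j, exact[j - 1], approx_mean(j, t, s, u))
```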



APPENDIX C: THE RATE OF INTRODUCTION OF NEW MUTANTS

Simulations of our Galton-Watson process suggest that the times at which a new mutant
with a surviving lineage is produced have a significant effect on the dynamics of
the process. In this section we give an approximation for the average time it takes the
first $j$-cell with surviving lineage to produce a $(j+1)$-cell with surviving lineage.

The average number of $j$-cells grows as $x_j(\tau) = \frac{1}{1-q_j}[b_j(2-u)]^{\tau}$,
where $\tau$ is the time measured from the appearance of the first successful $j$-cell
and $q_j = d_j/b_j$ is the extinction probability of $j$-cells. New $(j+1)$-cells
with surviving lineages appear at rate $(1-q_{j+1})\,u\,b_j\,x_j$, and we approximate the
time of the appearance of the first $(j+1)$-cell with surviving lineage, $\tau_j$, by the
time when the total rate reaches one cell, that is

$$\sum_{t=1}^{\tau_j}(1-q_{j+1})\,u\,b_j\,\frac{[b_j(2-u)]^t}{1-q_j} = 1. \tag{C1}$$

This leads to

$$\tau_j = \frac{\log\left[1 + \frac{1-q_j}{u\,b_j\,(1-q_{j+1})}\left(1 - \frac{1}{b_j(2-u)}\right)\right]}{\log[b_j(2-u)]}. \tag{C2}$$

We consider selection and mutation rate to be small enough, $u \ll 1$ and $s \ll 1$, so
$\log[b_j(2-u)] \approx js$. We also assume $js \ll 1$ so we can approximate
$2 - (1-s)^j \approx 1 + js$, and thus $1 - 1/[b_j(2-u)] \approx js$. In these limits
we also have $\frac{1-q_j}{u\,b_j\,(1-q_{j+1})} \approx \frac{2j}{u(j+1)}$. Now we can write

$$\tau_j \approx \frac{1}{js}\log\left[1 + \frac{2j^2 s}{u(j+1)}\right]. \tag{C3}$$

We can further simplify this formula by noting that $j/(j+1) \approx 1$ (and that
$u \ll js$) to obtain

$$\tau_j \approx \frac{1}{js}\log\frac{2js}{u}. \tag{C4}$$

If, in addition, we assume that generation time is $T$, we get

$$\tau_j = \frac{T}{js}\log\frac{2js}{u}. \tag{C5}$$

The excellent agreement between formula (C4) and simulations is shown in Fig. S3.

FIG. 6: Speed of introduction of new mutants: comparison of formula and simulation.
Comparison of predicted and simulated average time it takes the lineage of the first
successful $j$-mutant to produce the first successful $(j+1)$-mutant, $\tau_j$, for
different values of selective advantage $s$. Circles correspond to times obtained from
simulations, and lines correspond to formula (C4). Parameter values are $u = 10^{-5}$
and $T = 4$ days.



APPENDIX D: WAITING TIME TO k 
MUTATIONS 

We also derive a formula for the average time it takes for the first successful
$k$-mutant to be produced in the process, $t_k$, by assuming

$$t_k = \sum_{j=1}^{k-1}\tau_j. \tag{D1}$$

Substituting expression (C4) for $\tau_j$ (with generation time $T$, as in (C5)), we
arrive at

$$t_k = \frac{T}{s}\sum_{j=1}^{k-1}\frac{1}{j}\log\frac{2js}{u}. \tag{D2}$$

We approximate the last sum with an integral as

$$t_k \approx \frac{T}{s}\int_1^k \frac{1}{j}\log\frac{2js}{u}\,dj, \tag{D3}$$

which then leads to the following formula for waiting time

$$t_k = \frac{T}{2s}\log\frac{4ks^2}{u^2}\,\log k. \tag{D4}$$

The comparison between formula (D4) and simulations is shown in Fig. S4.

FIG. 7: Waiting time to $k$ mutations. Comparison of predicted and simulated average
time it takes for the first successful $k$-mutant to be produced in the process for
different values of selective advantage $s$. Circles correspond to times obtained from
simulations, and lines correspond to formula (D4). Parameter values are $u = 10^{-5}$
and $T = 4$ days.
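
To get a feel for how much the integral approximation costs, one can compare the sum $\sum_{j=1}^{k-1}\frac{T}{js}\log\frac{2js}{u}$ with the closed form $\frac{T}{2s}\log\frac{4ks^2}{u^2}\log k$. The sketch below does this for one parameter set; function names are ours, and the comparison is between the two formulas only, not against simulations.

```python
import math

def tau(j, s, u, T):
    """Waiting time between the first successful j- and (j+1)-mutants."""
    return T / (j * s) * math.log(2 * j * s / u)

def t_k_sum(k, s, u, T):
    """Waiting time to k mutations as the sum of tau_1 ... tau_{k-1}."""
    return sum(tau(j, s, u, T) for j in range(1, k))

def t_k_closed(k, s, u, T):
    """Closed form obtained by replacing the sum with an integral from 1 to k."""
    return T / (2 * s) * math.log(4 * k * s**2 / u**2) * math.log(k)

s, u, T = 0.01, 1e-5, 4.0  # T in days
for k in (2, 5, 10, 20):
    total, closed = t_k_sum(k, s, u, T), t_k_closed(k, s, u, T)
    print(k, round(total / 365, 1), round(closed / 365, 1), round(closed / total, 2))
```

Because the summand decreases with $j$ for these parameters, the integral from 1 to $k$ underestimates the sum, so the closed form is somewhat smaller than the sum for small $k$.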



APPENDIX E: PASSENGER MUTATIONS 

Suppose now that we have a model in which there are two types of mutations: drivers,
which confer selective advantage as before, and passengers, which have no influence on
the fitness of the cell. If a cell with $n$ passenger mutations divides, then each of
the daughter cells can acquire one additional passenger mutation with probability $v$.
Since passenger mutations do not affect the fitness of the cell, after $t$ time steps
each cell still alive has the probability

$$\binom{t}{n}\,v^n(1-v)^{t-n} \tag{E1}$$

to have $n$ passenger mutations. It follows that the average number of passenger
mutations present in the neoplastic cell population after $t$ time steps is

$$n(t) = tv. \tag{E2}$$

Note that a crucial condition for (E1) to be valid is that the time increments must be
constant, that is, by time $t$ each cell has undergone $t$ cell divisions. This
condition is not satisfied generally in continuous time branching processes. Note also
that, while in our model only one of the two offspring can acquire a driver mutation in
a cell division, both of them can acquire a passenger mutation. The reason is that we
can safely neglect the possibility of new driver mutations in both offspring, since that
is less probable, by a factor of roughly $u/2 = 0.5 \times 10^{-5}$, than acquiring a
driver mutation in only one of the offspring.
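
The statement that the mean number of passengers is $tv$ is the mean of a binomial distribution with $t$ trials and success probability $v$. The following sketch checks this numerically for one arbitrary choice of $t$ and $v$; the function names and the example values are ours.

```python
import math

def passenger_pmf(n, t, v):
    """Probability that a surviving cell carries n passengers after t divisions:
    C(t, n) * v**n * (1 - v)**(t - n)."""
    return math.comb(t, n) * v**n * (1 - v) ** (t - n)

def mean_passengers(t, v):
    """Mean of the distribution above, computed by direct summation."""
    return sum(n * passenger_pmf(n, t, v) for n in range(t + 1))

t, v = 200, 0.02  # illustrative values only
print(mean_passengers(t, v), t * v)
```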



APPENDIX F: DRIVERS VS PASSENGERS 



Combining our results (D4) and (E2) for driver and passenger mutations, we give a
formula for the number of passengers we expect to find in a tumor that accumulated
$k$ driver mutations

$$n = \frac{v}{2s}\log\frac{4ks^2}{u^2}\,\log k. \tag{F1}$$

Note that $n$ is the number of passengers that were present in the last cell that
clonally expanded. It is these passenger mutations that can be detected experimentally.
Formula (F1) can only be applied to tumors in tissues in which there was not much cell
division prior to tumorigenesis.

APPENDIX G: CONTINUOUS TIME FORMULAS 

In this section we define a similar continuous time 
model and list the above analytical results in this set- 
ting. As before, we start with one cell with one driver 
mutation. In a short time interval At, a cell with j driver 
mutations can divide with probability bjAt and die with 
probability djAt. 

Let Xj (t) be the expected number of cells with j muta- 
tions alive at time t. The population is evolving accord- 
ing to the differential equations: 



dxi 
~dt 



= [hi{l -u) - di]xi, 



dxj 



— [bj{l — m) — dj]xj + bj^iUXj^i for j > 2, 



subject to the initial conditions a;i(0) = 1 and Xj(0) ~ 
for j >2. This system can be solved analytically and the 
exact solution is 



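\[ x_j(t) = u^{j-1} \prod_{i=1}^{j-1} b_i \sum_{k=1}^{j} \frac{e^{r_k t}}{\prod_{q=1,\, q \neq k}^{j} \left[(b_k - b_q)(1-u) + d_q - d_k\right]} \tag{G1} \]

with $r_k = b_k(1-u) - d_k$.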
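
The closed-form solution can be checked numerically against a direct integration of the differential equations. The sketch below assumes (G1) has the form x_j(t) = u^(j-1) prod_{i<j} b_i sum_k exp(r_k t) / prod_{q != k} (r_k - r_q), with r_k = b_k(1-u) - d_k, and uses the illustrative rates b_j = 1, d_j = (1-s)^j. The function names, the Runge-Kutta integrator and the parameter values are assumptions made for this example only.

```python
import math

def rates(j, s):
    # Illustrative (assumed) rates: constant division, decreasing death rate.
    return 1.0, (1.0 - s) ** j

def closed_form(j, t, s, u):
    """x_j(t) from the assumed form of (G1); requires distinct r_k."""
    b = {i: rates(i, s)[0] for i in range(1, j + 1)}
    d = {i: rates(i, s)[1] for i in range(1, j + 1)}
    r = {i: b[i] * (1.0 - u) - d[i] for i in range(1, j + 1)}
    prefactor = u ** (j - 1) * math.prod(b[i] for i in range(1, j))
    total = 0.0
    for k in range(1, j + 1):
        denom = math.prod(r[k] - r[q] for q in range(1, j + 1) if q != k)
        total += math.exp(r[k] * t) / denom
    return prefactor * total

def integrate(jmax, t_end, s, u, steps=2000):
    """Integrate the ODE system with classical RK4; returns [x_1, ..., x_jmax]."""
    b = [rates(j, s)[0] for j in range(1, jmax + 1)]
    d = [rates(j, s)[1] for j in range(1, jmax + 1)]

    def f(x):
        out = []
        for i in range(jmax):
            val = (b[i] * (1.0 - u) - d[i]) * x[i]
            if i > 0:
                val += b[i - 1] * u * x[i - 1]  # influx from the previous type
            out.append(val)
        return out

    x = [1.0] + [0.0] * (jmax - 1)  # one cell with one driver at t = 0
    h = t_end / steps
    for _ in range(steps):
        k1 = f(x)
        k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6.0 * (a + 2 * p + 2 * q + w)
             for xi, a, p, q, w in zip(x, k1, k2, k3, k4)]
    return x

numeric = integrate(3, 5.0, s=0.1, u=0.01)
for j in (1, 2, 3):
    print(j, closed_form(j, 5.0, 0.1, 0.01), numeric[j - 1])
```

The example uses a deliberately large s so that the rates r_k are well separated. For small s the denominators in the sum nearly cancel and the closed form becomes sensitive to rounding error.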

In order to model tumor progression, let us specify
the rates $b_j$ and $d_j$. Perhaps the simplest choice is to
assign the same fitness advantage to each driver mutation,
that is, to have a j-dependent division rate $b_j = 1 + sj$
and a constant death rate $d_j = 1$. The main problem
with this choice is that, when these rates are substituted into
the general formula (G1), the average
number of cells becomes infinite at the finite time
$t^* = -\log u / [s(1-u)]$. The underlying reason for this
blowup is the presence of an infinite number of cell types.



This artifact can be easily avoided by making each mutation
decrease the death rate of cells, that is, by defining
$d_j = (1-s)^j$ and making the division rate constant,
$b_j = 1$. The population always remains finite in this version
of the model. Fitter cells, however, have shorter generation
times than less fit cells. Hence, at any given time
t, different cells may have undergone different numbers
of cell divisions. As a consequence, the expected number
of neutral mutations is not the same for all cells (in
fact it is positively correlated with the number of driver
mutations), and we do not have a simple relationship
between drivers and passengers as in the discrete time
case. For this reason we propose the following definition
instead.

We define a continuous time branching process similar
to the discrete one we use in the paper. In this process,
an event (division or death of a cell) occurs at rate $1/T$. If
an event occurs to a cell with j mutations, then it is death
with probability $\frac{1}{2}(1-s)^j$ and division with probability
$1 - \frac{1}{2}(1-s)^j$. Thus, $b_j = \left(1 - \frac{1}{2}(1-s)^j\right)/T$ and
$d_j = \frac{1}{2}(1-s)^j / T$.
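
The process just defined can be simulated event by event. The following is a minimal sketch, assuming that each cell experiences events at rate 1/T, that an event on a cell with j drivers is a death with probability (1/2)(1-s)^j and a division otherwise, and that at a division one daughter acquires a new driver with probability u (consistent with the differential equations above). The function names, the `max_cells` cap and the parameter values are hypothetical choices for illustration.

```python
import random

def event_probs(j, s):
    """Return (death, division) probabilities for a cell with j drivers."""
    death = 0.5 * (1.0 - s) ** j
    return death, 1.0 - death

def simulate(t_end, s, u, T=1.0, seed=0, max_cells=100_000):
    """Simulate the process; returns {j: number of cells with j drivers}."""
    rng = random.Random(seed)
    counts = {1: 1}  # start with one cell carrying one driver
    t = 0.0
    while True:
        n = sum(counts.values())
        if n == 0 or n >= max_cells:
            break
        t += rng.expovariate(n / T)  # total event rate is n / T
        if t > t_end:
            break
        # Choose the type of the affected cell in proportion to its count.
        pick = rng.randrange(n)
        for j, c in counts.items():
            if pick < c:
                break
            pick -= c
        death, _ = event_probs(j, s)
        if rng.random() < death:
            counts[j] -= 1
        elif rng.random() < u:
            # Division in which one daughter gains a new driver.
            counts[j + 1] = counts.get(j + 1, 0) + 1
        else:
            counts[j] += 1
        if counts[j] == 0:
            del counts[j]
    return counts

print(simulate(t_end=50.0, s=0.005, u=1e-5, seed=1))
```

Because the death probability of the founder cell is close to 1/2 when s is small, many runs end with an empty population. Conditioning on survival, or averaging over many seeds, would be needed to compare against the expected values $x_j(t)$.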

In this case, the time between the appearance of the
first successful j-cell and the appearance of the first successful
(j+1)-cell, $\tau_j$, is given by

\[ \tau_j = \frac{T}{js} \log\frac{2js}{u}. \tag{G2} \]

The waiting time to the first successful cell with k mutations is

\[ \tau_k = \frac{T}{2s} \log\frac{4ks^2}{u^2} \log k. \tag{G3} \]
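
One way to see how the two waiting-time formulas relate is to sum the successive waiting times numerically and compare the result with the closed form. The sketch below assumes (G2) reads tau_j = (T/(js)) log(2js/u) and (G3) reads tau_k = (T/(2s)) log(4ks^2/u^2) log k; the closed form then follows from replacing the sum over j = 1, ..., k-1 by an integral. Function names and parameter values are illustrative assumptions.

```python
import math

def tau_j(j, s, u, T=1.0):
    """Waiting time between successful j- and (j+1)-cells (assumed form of G2)."""
    return T / (j * s) * math.log(2.0 * j * s / u)

def tau_k_closed(k, s, u, T=1.0):
    """Waiting time to the first successful k-cell (assumed form of G3)."""
    return T / (2.0 * s) * math.log(4.0 * k * s**2 / u**2) * math.log(k)

def tau_k_sum(k, s, u, T=1.0):
    """Direct sum of the successive waiting times tau_1, ..., tau_{k-1}."""
    return sum(tau_j(j, s, u, T) for j in range(1, k))

# Illustrative (assumed) parameters: s = 0.005, u = 1e-5, T = 1.
for k in (5, 10, 20):
    print(k, round(tau_k_sum(k, 0.005, 1e-5)), round(tau_k_closed(k, 0.005, 1e-5)))
```

The direct sum and the closed form agree in order of magnitude. The remaining gap comes from approximating the harmonic-type sums by logarithms.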



Since the times between successive divisions of a single
cell line are constant on average, we can use formula (E2)
for passenger mutations, in order to get the following
formula for the number of passengers as a function of the
number of drivers



APPENDIX H: MUTATION DATA 

Parsons et al. [33] sequenced 20,661 protein coding
genes in 22 human glioblastoma multiforme (GBM) tumor
samples using polymerase chain reaction (PCR) sequence
analysis. Seven samples were extracted directly from patient
tumors and 15 samples were passaged in nude mice as
xenografts. All samples were matched with normal tissue
from the same patient in order to exclude germline mutations.
Analysis of the identified somatic mutations revealed
that one tumor (Br27P), from a patient previously
treated with radiation therapy and temozolomide, had 17
times as many alterations as any of the other 21 patients,
consistent with previous observations of a hypermutation
phenotype in glioma samples of patients treated with
Gene        Mutation   CHASM score   P-value
CDKN2A      H98P       0.024         0.0004
CDKN2A      L63V       0.096         0.0004
TP53        C275Y      0.028         0.0004
TP53        G266V      0.024         0.0004
TP53        H179R      0.152         0.0004
TP53        I255N      0.024         0.0004
TP53        L257P      0.048         0.0004
TP53*       R175H      0.078         0.0004
TP53*       R248W      0.114         0.0004
TP53        R282W      0.126         0.0004
TP53        S241F      0.044         0.0004
TP53*       V217G      0.144         0.0004
TP53*       Y234C      0.022         0.0004
NEK8        A197P      0.268         0.0008
PIK3CG      R839C      0.258         0.0008
SMAD4*      C363R      0.240         0.0008
TP53        D208V      0.240         0.0008
TP53*       K120R      0.262         0.0008
TP53        T155P      0.202         0.0008
MAPT        G333V      0.322         0.0021
DGKA        V379I      0.336         0.0025
STK33       F323L      0.342         0.0025
FLJ25006    S196L      0.392         0.0038
PRDM5*      V85I       0.396         0.0038
TP53        L344P      0.406         0.0050
TTK         D697Y      0.426         0.0063
NFATC3*     G451R      0.464         0.0067
PRKCG*      P524R      0.444         0.0067
CMAS        I275R      0.474         0.0071
KRAS*       G12D       0.474         0.0071
PCDHB2      A323V      0.476         0.0071
STN2        I590S      0.474         0.0071
SMAD4       Y95S       0.496         0.0092

TABLE II: Driver mutations predicted by CHASM. Missense
mutations found in 24 pancreatic cancer samples from Jones
et al. [36] which are classified as drivers by CHASM at FDR of
0.2, shown with their associated Random Forest scores and P
values. (* denotes the missense mutations classified as drivers
in the 9 samples used in our analysis.)



temozolomide [3S]. After removing Br27P from consider- 
ation, it was found that the 6 DNA samples extracted di- 
rectly from patient tumors had smaller numbers of muta- 
tions than those obtained from xenografts, likely because 
of the masking effect of nonneoplastic cells in the former 
[35]. For this reason we chose to focus only on the mutation
data which were taken from xenografts. From the 15
xenograft samples, we excluded one sample (Br04X) because
it was taken from a recurrent GBM which may have
had prior radiation therapy or chemotherapy, leaving us
with the 14 samples we used for our study.

Similarly, Jones et al. [35] sequenced 20,661 protein 
coding genes in 24 pancreatic cancers. 10 samples were 
passaged in nude mice as xenografts and 14 in cell lines. 
For the purpose of our study, we discarded the samples 
taken from metastases, and used the 9 samples which 
were taken from primary tumors as xenografts, for con- 
sistency with GBM data. 



APPENDIX I: CHASM ANALYSIS OF MISSENSE 
MUTATIONS FOUND IN PANCREATIC 
CANCERS 



Carter et al. [33] used the CHASM algorithm to analyse
missense mutations found in 22 GBM samples from
Parsons et al. [34] and classify them as either drivers or
passengers. We carried out a CHASM analysis of the missense
mutations found in the original 24 pancreatic cancer samples
[36]. The 33 mutations that were classified as drivers by
the CHASM algorithm at a false discovery rate (FDR) of 0.2
are shown in Table II.